    Efficient parallel computation on workstation clusters

    We present novel hardware and software that efficiently implement communication primitives for parallel execution on workstation clusters. We provide low communication latencies, a minimal protocol, zero operating-system overhead, and high throughput. With this technology, effective parallel systems can be built from off-the-shelf workstations. Our goal is to develop a standard interface board and the necessary software for interconnecting any number of computers, from a single workstation to a cabinet full of workstation boards.

    Latency hiding in parallel systems: a quantitative approach

    In many parallel applications, network latency causes a dramatic loss in processor utilization. This paper examines software pipelining as a technique for hiding network latency. It quantifies the potential improvements with detailed, instruction-level simulations. The benchmarks used are the Livermore Loop kernels and BLAS Level 1. These were parallelized and run on the instruction-level RISC simulator DLX, extended with both a blocking and a pipelined network. Our results show that prefetching in a pipelined network improves performance by a factor of 2 to 9, provided the network has sufficient bandwidth to accept at least 10 requests per processor.
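    The structure of a software-pipelined loop can be illustrated with a toy cycle-counting model. This sketch is not the paper's DLX setup: `SimNetwork`, `get_nb`, `wait`, and the fixed 50-cycle latency are hypothetical, chosen only to show a loop that fills a window of `depth` prefetches (prologue) and then issues the next read before working on the current element (kernel). Issuing a request is assumed to cost no processor cycles.

```python
from collections import deque


class SimNetwork:
    """Hypothetical cycle-counting model of remote memory (illustration only).

    A non-blocking get is free to issue; its data becomes available
    `latency` cycles later, and waiting on it stalls the processor until then.
    """

    def __init__(self, data, latency):
        self.data = list(data)
        self.latency = latency
        self.clock = 0  # current processor cycle

    def get_nb(self, index):
        # Issue a remote read; the handle records when the data will arrive.
        return (self.clock + self.latency, index)

    def wait(self, handle):
        # Stall until the requested value has arrived, then return it.
        ready, index = handle
        self.clock = max(self.clock, ready)
        return self.data[index]

    def compute(self, cycles):
        # Local work on the processor.
        self.clock += cycles


def blocking_sum(net, work):
    """Baseline: every remote read stalls for the full network latency."""
    total = 0
    for i in range(len(net.data)):
        total += net.wait(net.get_nb(i))
        net.compute(work)
    return total


def pipelined_sum(net, work, depth):
    """Software-pipelined loop keeping up to `depth` reads in flight."""
    n = len(net.data)
    pending = deque()
    # Prologue: fill the pipeline with the first `depth` prefetches.
    for i in range(min(depth, n)):
        pending.append(net.get_nb(i))
    total = 0
    for i in range(n):
        value = net.wait(pending.popleft())
        # Kernel: prefetch element i + depth before working on element i.
        if i + depth < n:
            pending.append(net.get_nb(i + depth))
        total += value
        net.compute(work)
    return total


if __name__ == "__main__":
    slow = SimNetwork(range(100), latency=50)
    fast = SimNetwork(range(100), latency=50)
    assert blocking_sum(slow, work=5) == pipelined_sum(fast, work=5, depth=10)
    print(f"blocking: {slow.clock} cycles, pipelined: {fast.clock} cycles")
```

    Under these assumptions the blocking loop takes 5500 cycles and the pipelined loop with `depth=10` takes 550. A deeper window no longer helps once depth reaches latency / work (here 10), because the loop becomes compute-bound; qualitatively, this is why the gains depend on how many requests the network can accept per processor.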

    PSPVM: implementing PVM on a high-speed interconnect for workstation clusters

    PSPVM is an implementation of the PVM package on top of ParaStation's high-speed interconnect for workstation clusters. The ParaStation system uses user-level communication for message exchange and removes the operating system from the critical path of message transmission. ParaStation's user interface consists of a user-level socket emulation, so only minor changes to the standard PVM package are needed to run it on the ParaStation system. Compared to regular PVM, PSPVM increases throughput eightfold and reduces latency by a factor of four. Most of the remaining latency (88%) is caused by the PVM package itself: the underlying sockets are so fast (25 μs) that PVM is the limiting factor. PSPVM offers nearly the raw performance of the network to the user and is object-code compatible with regular PVM. As a consequence, we achieve an application speed-up of four to six over traditional PVM using regular Ethernet on a cluster of workstations.
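    The two latency figures can be related with a little arithmetic. Assuming (our interpretation; the abstract does not say so explicitly) that all latency not caused by PVM is spent in the 25 μs socket layer, the sockets account for 12% of the total, which implies an end-to-end PSPVM latency of roughly 25 / 0.12 ≈ 208 μs, about 183 μs of it in PVM.

```python
# Figures quoted in the PSPVM abstract.
SOCKET_LATENCY_US = 25.0  # user-level socket latency on ParaStation
PVM_SHARE = 0.88          # share of PSPVM latency caused by the PVM package


def implied_total_latency_us(socket_us: float, pvm_share: float) -> float:
    """Assumption: all latency not caused by PVM is spent in the socket layer."""
    return socket_us / (1.0 - pvm_share)


total = implied_total_latency_us(SOCKET_LATENCY_US, PVM_SHARE)
pvm_overhead = total * PVM_SHARE
print(f"PSPVM latency ~ {total:.0f} us, of which PVM ~ {pvm_overhead:.0f} us")
```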

    The ParaPC/ParaStation project: efficient parallel computing by clustering workstations

    ParaStation is a communications fabric for connecting off-the-shelf workstations into a supercomputer. The fabric employs technology used in massively parallel machines and scales up to 4096 nodes. The message-passing software preserves the low latency of the fabric by taking the operating system out of the communication path while still providing full protection. The first implementation of ParaStation, using Digital's AlphaGeneration workstations, achieves end-to-end (process-to-process) latencies as low as 2.5 μs and a sustained bandwidth of more than 10 MByte/s per channel with small packets. Benchmarks using PVM on ParaStation demonstrate real application performance of 1 GFLOP/s on an 8-node cluster.
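    One common way to read a latency/bandwidth pair is the linear (Hockney-style) cost model T(m) = t0 + m / B, in which the message size n_1/2 = t0 · B achieves half of the asymptotic bandwidth. This model is an assumption, not part of the abstract, and the function names are illustrative. With t0 = 2.5 μs and B = 10 MByte/s (treating the quoted lower bound as exact), n_1/2 is 25 bytes.

```python
def transfer_time_us(size_bytes: float, latency_us: float, bandwidth_mbyte_s: float) -> float:
    """Linear cost model T(m) = t0 + m / B.

    With MByte taken as 10**6 bytes, 1 MByte/s is exactly 1 byte per us,
    so dividing bytes by MByte/s yields microseconds.
    """
    return latency_us + size_bytes / bandwidth_mbyte_s


def effective_bandwidth_mbyte_s(size_bytes, latency_us, bandwidth_mbyte_s):
    """Delivered bandwidth (bytes per us == MByte/s) for one message."""
    return size_bytes / transfer_time_us(size_bytes, latency_us, bandwidth_mbyte_s)


def half_performance_size_bytes(latency_us, bandwidth_mbyte_s):
    """n_1/2: size at which effective bandwidth is half the asymptotic one."""
    return latency_us * bandwidth_mbyte_s


if __name__ == "__main__":
    T0_US, BW = 2.5, 10.0  # ParaStation figures; 10 MByte/s is a quoted lower bound
    print(f"n_1/2 = {half_performance_size_bytes(T0_US, BW):.0f} bytes")
    for size in (8, 25, 100, 1000):
        bw = effective_bandwidth_mbyte_s(size, T0_US, BW)
        print(f"{size:5d} bytes: {bw:5.2f} MByte/s")
```

    In this model small messages are latency-dominated, so a 2.5 μs latency matters most for the fine-grained traffic typical of parallel programs.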

    Prefetching on the Cray-T3E: a model and its evaluation

    In many parallel applications, network latency causes a dramatic loss in processor utilization. This paper examines software-controlled access pipelining (SCAP) as a technique for hiding network latency. A brief analytic model of SCAP describes its basic operation and the expected performance improvements. Results are quantified with benchmarks on the Cray-T3E. The benchmarks used are Jacobi iteration, parts of the Livermore Loop kernels, and others representing six different classes of parallel algorithms. These were parallelized and optimized by hand to show the performance trade-offs of several pipelining techniques. Our results show that SCAP on the Cray-T3E improves performance over blocking execution by a factor of 2.1 to 38. Compared with HPF, it achieves a speed-up ranging from 12% to a factor of 3.1, depending on the algorithm class.
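    The paper's own SCAP model is not reproduced here. As a hedged stand-in, a simple steady-state estimate shows why pipelining depth matters. It assumes each remote access costs `latency` cycles, each element needs `compute` cycles of local work, the request for element i + d is issued just before element i is processed, and at most d requests are outstanding. After the first element, one element then completes roughly every max(compute, latency / d) cycles. All names and values are illustrative.

```python
import math


def blocking_time(n: int, latency: float, compute: float) -> float:
    """Every access stalls for the full latency before the work starts."""
    return n * (latency + compute)


def pipelined_time(n: int, latency: float, compute: float, depth: int) -> float:
    """Steady-state estimate for a software-pipelined loop (toy model).

    The first element waits the full latency; afterwards the loop completes
    one element per period, limited either by local work (compute-bound) or
    by `depth` requests covering one latency (latency-bound). Exact for
    depth == 1 and for depth >= min_depth, approximate in between.
    """
    period = max(compute, latency / depth)
    return latency + compute + (n - 1) * period


def min_depth(latency: float, compute: float) -> int:
    """Smallest window that makes the loop compute-bound in this model."""
    return math.ceil(latency / compute)


if __name__ == "__main__":
    n, latency, compute = 100, 50, 5
    d = min_depth(latency, compute)
    speedup = blocking_time(n, latency, compute) / pipelined_time(n, latency, compute, d)
    print(f"min depth {d}, estimated speed-up {speedup:.1f}")
```

    With these illustrative values `min_depth` is 10 and the estimated speed-up for n = 100 is 5500 / 550 = 10, approaching (latency + compute) / compute = 11 as n grows. The toy model does not attempt to reproduce the measured T3E factors.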

    The ParaStation Project: Using Workstations as Building Blocks for Parallel Computing

    The ParaStation communication fabric provides a high-speed communication network with user-level access to enable efficient parallel computing on workstation clusters. The architecture, implemented on off-the-shelf workstations coupled by the ParaStation communication hardware, removes the kernel and common network protocols from the communication path while still providing full protection in a multiuser, multiprogramming environment. The programming interface presented by ParaStation consists of a UNIX socket emulation and widely used parallel programming environments such as PVM, P4, and MPI. This allows a wide range of client/server and parallel applications to be ported to the ParaStation architecture. The first implementation of ParaStation using Digital's AlphaGeneration workstations achieves a communication latency as low as 2.5 μs (process-to-process) and a sustained bandwidth of more than 10 MByte/s per process. Benchmarks using PVM on ParaStation demonstrate real application performa..

    Using Workstations as Building Blocks for Parallel Computing

    The key to efficient parallel computing on workstation clusters is a communication subsystem that removes the operating system from the communication path and eliminates all unnecessary protocol overhead. At the same time, protection and a stable multiuser, multiprogrammed environment cannot be sacrificed. We have developed a communication subsystem, called ParaStation2, that fulfills these requirements. Its one-way latency is 14.5 μs to 18 μs (depending on the hardware platform) and its throughput is 65 to 90 MByte/s, which compares well with other approaches. We achieved an application performance of 5.3 GFLOP/s running a matrix multiplication on 8 DEC Alpha machines (21164A, 500 MHz). ParaStation2 offers standard programming interfaces, including PVM, MPI, Unix sockets, Java sockets, and Java RMI. These interfaces allow parallel applications to be ported to ParaStation2 with minimal effort. The system is implemented on a variety of platforms, including DEC Alpha workstations ..
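    The matrix-multiplication result can be put in per-node terms. Assuming (not stated in the abstract) a peak of two floating-point operations per cycle for the 500 MHz 21164A, i.e. 1 GFLOP/s per node, 5.3 GFLOP/s on 8 machines is roughly two thirds of the aggregate peak.

```python
TOTAL_GFLOPS = 5.3   # matrix multiplication on 8 machines (from the abstract)
NODES = 8
CLOCK_GHZ = 0.5      # 21164A at 500 MHz
FLOPS_PER_CYCLE = 2  # assumption: one FP add plus one FP multiply per cycle

per_node = TOTAL_GFLOPS / NODES
peak_per_node = CLOCK_GHZ * FLOPS_PER_CYCLE
efficiency = per_node / peak_per_node
print(f"{per_node:.2f} GFLOP/s per node, {efficiency:.0%} of assumed peak")
```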